Manetho: Transparent Rollback-Recovery with Low Overhead, Limited Rollback, and Fast Output Commit

Authors

  • E. N. Elnozahy
  • Willy Zwaenepoel
Abstract

Manetho is a new transparent rollback-recovery protocol for long-running distributed computations. It uses a novel combination of antecedence graph maintenance, uncoordinated checkpointing, and sender-based message logging. Manetho simultaneously achieves the advantages of pessimistic message logging, namely limited rollback and fast output commit, and the advantage of optimistic message logging, namely low failure-free overhead. These advantages come at the expense of a complex recovery scheme.

Index Terms: Antecedence graph, checkpointing, message logging, rollback-recovery, transparent fault tolerance.
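
To make the terms in the abstract concrete, here is a minimal sketch of sender-based message logging combined with uncoordinated (purely local) checkpointing. It is an illustration under stated assumptions, not Manetho's implementation: the `Process` and `Message` classes, the synchronous in-process delivery, and the use of a receive sequence number (RSN) to fix replay order are all simplifications introduced here, and the antecedence graph is not modelled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    ssn: int        # send sequence number, assigned by the sender
    payload: str


@dataclass
class Process:
    """Hypothetical process model; all names here are illustrative."""
    name: str
    state: list = field(default_factory=list)     # payloads in delivery order
    send_log: dict = field(default_factory=dict)  # volatile log kept by the sender
    next_ssn: int = 0
    next_rsn: int = 0                             # next receive sequence number
    checkpoint: tuple = ((), 0)                   # (saved state, saved next_rsn)

    def send(self, dest: "Process", payload: str) -> None:
        msg = Message(self.name, self.next_ssn, payload)
        self.next_ssn += 1
        # The sender logs the message in its own memory before delivery.
        self.send_log[(dest.name, msg.ssn)] = [msg, None]
        rsn = dest.receive(msg)
        # The receiver reports the order in which it consumed the message.
        self.send_log[(dest.name, msg.ssn)][1] = rsn

    def receive(self, msg: Message) -> int:
        rsn = self.next_rsn
        self.next_rsn += 1
        self.state.append(msg.payload)
        return rsn

    def take_checkpoint(self) -> None:
        # Uncoordinated: no other process is consulted.
        self.checkpoint = (tuple(self.state), self.next_rsn)

    def crash_and_recover(self, peers: list) -> None:
        # Roll back to the last local checkpoint.
        saved_state, saved_rsn = self.checkpoint
        self.state, self.next_rsn = list(saved_state), saved_rsn
        # Collect messages received after the checkpoint from the senders' logs.
        lost = [
            (rsn, msg)
            for peer in peers
            for (dest, _), (msg, rsn) in peer.send_log.items()
            if dest == self.name and rsn is not None and rsn >= saved_rsn
        ]
        # Replay them in the original receive order.
        for _, msg in sorted(lost, key=lambda pair: pair[0]):
            self.receive(msg)
```

In this sketch only the failed process rolls back, and its peers merely resend logged messages, which is the "limited rollback" property the abstract attributes to pessimistic logging. Keeping the log in the sender's volatile memory rather than on stable storage is what keeps the failure-free cost low in this toy model.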

Similar resources

Implementation and Performance of Transparent Rollback-recovery in Manetho

We describe the implementation and performance of rollback-recovery in Manetho. During failure-free operation, Manetho maintains an antecedence graph which records the “happened before” relation between certain events in the distributed computation. The antecedence graph is used in combination with checkpointing and volatile sender-based message logging to simultaneously achieve low failure-fre...

Efficient Transparent Optimistic Rollback Recovery for Distributed Application Programs

Existing rollback-recovery methods using consistent checkpointing may cause high overhead for applications that frequently send output to the “outside world,” since a new consistent checkpoint must be written before the output can be committed, whereas existing methods using optimistic message logging may cause large delays in committing output, since processes may buffer received messages arbi...

Survey of Backward Error Recovery Techniques for Multicomputers Based on Checkpointing and Rollback

For implementing fault tolerance in multicomputer systems, backward error recovery, based on checkpointing and rollback, is often used. During failure-free operation, the process states are regularly saved, and after a fault is detected, the system is rolled back to a previously saved state. We can distinguish four classes of techniques: semi-automatic techniques, message logging, coordinated ch...

Low-cost Checkpointing-based Rollback Recovery Algorithm Considering Scalability

In this paper, we design a low-cost checkpointing-based rollback recovery algorithm that addresses the traditional scalability problem of synchronous checkpointing from a completely different point of view than existing ones. This algorithm enables a cluster-wide set of processes to perform their semi-global checkpointing procedure while a small set of cluster heads monitor local commit of th...

An Application-Transparent, Platform-Independent Approach to Rollback-Recovery for Mobile Agent Systems

This paper proposes a new approach to rollback-recovery for mobile-agent systems, and describes its implementation in the MESSENGERS mobile agents system. The checkpointing method used makes it possible to implement space- and time-efficient, user-transparent rollback-recovery in heterogeneous distributed environments. Together with an efficient non-blocking system snapshot algorithm this checkpointing met...

Journal:
  • IEEE Trans. Computers

Volume 41

Publication date: 1992